How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?

Authors

  • Nicholas J Schurch
  • Pietá Schofield
  • Marek Gierliński
  • Christian Cole
  • Alexander Sherstnev
  • Vijender Singh
  • Nicola Wrobel
  • Karim Gharbi
  • Gordon G Simpson
  • Tom Owen-Hughes
  • Mark Blaxter
  • Geoffrey J Barton
Abstract

RNA-seq is now the technology of choice for genome-wide differential gene expression experiments, but it is not clear how many biological replicates are needed to ensure valid biological interpretation of the results, or which statistical tools are best for analyzing the data. An RNA-seq experiment with 48 biological replicates in each of two conditions was performed to answer these questions and to provide guidelines for experimental design. With three biological replicates, nine of the 11 tools evaluated found only 20%–40% of the significantly differentially expressed (SDE) genes identified with the full set of 42 clean replicates. This rises to >85% for the subset of SDE genes changing in expression by more than fourfold. Achieving >85% for all SDE genes, regardless of fold change, requires more than 20 biological replicates. The same nine tools successfully control their false discovery rate (FDR) at ≲5% for all numbers of replicates, while the remaining two tools fail to control their FDR adequately, particularly for low numbers of replicates. For future RNA-seq experiments, these results suggest that at least six biological replicates should be used, rising to at least 12 when it is important to identify SDE genes for all fold changes. If fewer than 12 replicates are used, a superior combination of true positive and false positive performance makes edgeR and DESeq2 the leading tools. For higher replicate numbers, minimizing false positives is more important, and DESeq marginally outperforms the other tools.
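The recovery percentages and FDR figures above are measured against a reference set of SDE genes. The sketch below shows how such rates can be computed. It assumes that the gold standard is the set of genes a tool calls SDE using all clean replicates, and it treats "more than fourfold" as |log2 fold change| > 2. The function names and toy gene sets are hypothetical illustrations, not the authors' code or data.

```python
"""Minimal sketch: score SDE calls from a replicate subset against a gold standard.

Assumption (hypothetical): the gold standard is the set of genes called SDE
using all clean replicates. All gene names and fold changes are toy data.
"""
from __future__ import annotations


def score_calls(subset_calls: set[str], gold_standard: set[str]) -> dict[str, float]:
    """Return the true positive rate (TPR) and observed false discovery rate (FDR).

    TPR = fraction of gold-standard SDE genes recovered by the subset analysis.
    FDR = fraction of the subset's calls that are absent from the gold standard.
    """
    true_positives = subset_calls & gold_standard
    false_positives = subset_calls - gold_standard
    tpr = len(true_positives) / len(gold_standard) if gold_standard else 0.0
    fdr = len(false_positives) / len(subset_calls) if subset_calls else 0.0
    return {"tpr": tpr, "fdr": fdr}


def tpr_above_fold_change(
    subset_calls: set[str], gold_log2fc: dict[str, float], min_abs_log2fc: float
) -> float:
    """TPR restricted to gold-standard genes with |log2 fold change| > threshold.

    A threshold of 2.0 corresponds to changes of more than fourfold.
    """
    strong = {gene for gene, lfc in gold_log2fc.items() if abs(lfc) > min_abs_log2fc}
    if not strong:
        return 0.0
    return len(subset_calls & strong) / len(strong)


# Hypothetical toy gold standard: 10 SDE genes with their log2 fold changes.
gold_log2fc = {
    "gene0": 2.5, "gene1": -3.0, "gene2": 0.8, "gene3": 0.4, "gene4": -0.6,
    "gene5": 1.2, "gene6": -0.3, "gene7": 0.9, "gene8": 2.2, "gene9": -0.7,
}
gold = set(gold_log2fc)

# Hypothetical calls from a 3-replicate analysis: three true SDE genes plus one
# gene ("gene99") that is not in the gold standard.
three_rep_calls = {"gene0", "gene1", "gene8", "gene99"}

print(score_calls(three_rep_calls, gold))  # {'tpr': 0.3, 'fdr': 0.25}
print(tpr_above_fold_change(three_rep_calls, gold_log2fc, 2.0))  # 1.0
```

In this toy case, the small-replicate run recovers only 30% of all gold-standard genes but all of the large-fold-change ones. That is the qualitative pattern the abstract describes, although the numbers here are invented. In a real evaluation, these rates would be averaged over many random replicate subsets.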

Similar resources

RNA-seq differential expression studies: more sequence or more replication?

MOTIVATION: RNA-seq is replacing microarrays as the primary tool for gene expression studies. Many RNA-seq studies have used insufficient biological replicates, resulting in low statistical power and inefficient use of sequencing resources. RESULTS: We show the explicit trade-off between more biological replicates and deeper sequencing in increasing power to detect differentially expressed (DE)...

Accurate differential gene expression analysis for RNA-Seq data without replicates

Despite the fact that many RNA-Seq experiments do not use biological replicates, most of the existing differential gene expression analysis for RNA-Seq data are designed to work with replicates. We present a novel method for differential gene expression analysis based on bootstrapping. We also discuss the use of different normalization methods with Fisher’s exact test. We compare the methods we...

S7 Table

Yes. RNA-seq can be used to quantify transcript levels from a sample. In order to perform useful statistics, one sample is insufficient. Replicates must be used to appropriately power such statistics. The RNA-seq method is an impressive advancement with many applications for studying RNA biology but it does not eliminate biological variability. If the input samples are heavily degraded or have ...

Bootstrap-based differential gene expression analysis for RNA-Seq data without replicates

Computer Science & Engineering Department, University of Connecticut, 06269 Storrs, CT, USA. Abstract: A major application of RNA-Seq is to perform differential gene expression analysis. Many tools exist to analyze differentially expressed genes in the presence of biologi...

Polyester: simulating RNA-seq datasets with differential transcript expression

MOTIVATION: Statistical methods development for differential expression analysis of RNA sequencing (RNA-seq) requires software tools to assess accuracy and error rate control. Since true differential expression status is often unknown in experimental datasets, artificially constructed datasets must be utilized, either by generating costly spike-in experiments or by simulating RNA-seq data. RES...

Journal title:
  • RNA

Volume 22, Issue 6

Publication date: 2016